This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

First of all, We are going to source these packages below

require("jsonlite")
## Loading required package: jsonlite
## 
## Attaching package: 'jsonlite'
## 
## The following object is masked from 'package:utils':
## 
##     View
require("RCurl")
## Loading required package: RCurl
## Loading required package: bitops
require("dplyr")
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
require("ggplot2")
## Loading required package: ggplot2
require("knitr")
## Loading required package: knitr
require("dplyr")
require("extrafont")
## Loading required package: extrafont
## Registering fonts with R

Next, we modified the R_ETL.R file and use it to create a table on SQL server. The modified versions are as follows:

The tabel we get is as follows:

UK Car Fuel Consumption and Emissions’s R_ETL.R file:


CREATE TABLE carconsump ( – Change table_name to the table name you want. year varchar2(4000), manufacturer varchar2(4000), model varchar2(4000), engine varchar2(4000), transmission varchar2(4000), transmission_type varchar2(4000), fuel_type varchar2(4000), X varchar2(4000), X_1 varchar2(4000), X_2 varchar2(4000), X_3 varchar2(4000), X_4 varchar2(4000), X_5 varchar2(4000), X_6 varchar2(4000), X_7 varchar2(4000), X_8 varchar2(4000), X_9 varchar2(4000), euro_standard number(38,4), engine_capacity number(38,4), urban_metric number(38,4), extra_urban_metric number(38,4), combined_metric number(38,4), urban_imperial number(38,4), extra_urban_imperial number(38,4), combined_imperial number(38,4), noise_level number(38,4), co2 number(38,4), co_emissions number(38,4), fuel_cost_6000_miles number(38,4) );

Understanding Tableau Data Extract

A Tableau data extract is a compressed snapshot of data stored on disk and loaded into memory as required to render a Tableau viz.There are two aspects of TDE design that make them ideal for supporting analytics and data discovery.

The first is that a TDE is a columnar store. Columnar databases store column values together rather than row values, and as a result, they dramatically reduce the input/output required to access and aggregate the values in a column.

##The second key aspect of TDE design is how they are structured which impacts how they are loaded into memory and used by Tableau.TDEs use all parts of your computer’s memory, from RAM to hard disk, and put each part to work as best fits its characteristics. To better understand, I tried to use connect to our data UK Car Fuel Consumptions and Emissions with an Extract API script in python, where I explored and walked through how a TDE is created from a CSV and then later used in our project as the data source for one or more visualizations. ##When Tableau creates a data extract, it first defines the structure for the TDE and creates separate files for each column in the underlying source.To complete the creation of a TDE, individual column files are combined with metadata to form a memory-mapped file. Because a TDE is a memory-mapped file, when Tableau requests data from a TDE, the data is loaded directly into memory by the operating system. Tableau doesn’t have to open, process or decompress the TDE to start using it. If necessary, the operating system continues to move data in and out of RAM to insure that all of the requested data is made available to Tableau. This is a key point - it means that Tableau can query data that is bigger than the available RAM on a machine!

Experiment Tableau Data Extract

I extracted the first 5000 thousand rows of our dataset, carconsump.reformatted.csv of total of 10,000 rows and save it as a .tde called carconsump.reformatted_5000Rows.tde. So I created a TDE from a CSV. Follows are the procedures to create a TDE:

For further information on TDE, check

Tableau Understanding Data Extracts

Exploring Tableau Data Visualizations!

Non-Aggregated Measures Analysis(start with a green thing) - also demonstrates Boxplots, Dates, and Pages

Boxplot Story (start with a green thing)

1. Open UK Car Consumption and Emissions. tde

2. Uncheck Aggregate Measures under the Analysis tab and Rename the Current Sheet to Co2 Emissions Boxplot.

3. Click on Co2 Under Measures and then click on the boxplot icon on “Show Me” i.e., the right icon on row 7.

4. Drag Fuel Type onto Columns.

5. Drag Manufacturer onto the Color Shelf.

6. Drag Manufacturer onto the Filters Shelf and from this Pill’s menu, select Show Quick Filter.

7. Drag Engine onto Detail

8. Drag Manufacturer onto Pages

9. Filtered out the top 20 luxury cars

10. Click through the luxury car brands on Pages

Ok brands

###The main fuel type used by this group of cars with price rage 30,000 - 150,000 are Diesel and Petrol. If using Deisel, Mercedez-Benz, Audi, BMW, Jaguar and all except Land Rover has mean emissions between 150 - 200 grammes per kilometre (g/km). If using Petrol, Mercedez-Benz, Audi, BMW have average near 200 grammes per kilometre (g/km). The worst performing manufacturer is Land Rover which has mean Co2 emissions up to 350 grammes per kilometre (g/km).

Real luxury brads

###Interestingly, top luxury car manufacturers such as Maserati, Lamborghini, Rolls-Royce, Ferrari, and Aston Martin Lagonda can only use one fuel type Petrol. Highest mean Co2 emission – Lamborghini up to 600 (g/km). ###If we plot out the boxplot of luxury cars’ Co2 emissions and Noise level, we see that the avaerage of Co2 emissions is around 211 g/km, and average of Noise level is up to 71 measured on the A scale of a noise meter (dB (A)). ###Why high emissions and high noise level for luxury cars? I can’t quite imagine a sports car with a V12 engine and maximum speed of 350kph having an emissions/ noise level bypass. Fortunately not everyone can afford one. Good news to our Earth!

Aggregated Measures Analysis(start with a green thing) - also demonstrates Histograms, Dual-axis Plots and Show Me

Histogram Story (start with a green thing)

1. Open UK Car Consumption and Emission Extract.tde

2. Rename Sheet 1 to Noise Level Histogram

3. Drag Engine to Column, and Noise Level to Row

4. Take Average as the Measure of Noise Level

5. Drag Noise Level to Filter and click Show Quick Filter

6. Adjust filter to find the Engine gives highest average of noise level

7. Adjust filter to find the Engine that gives lowest average of noise

##Upper 70s (dB (A)) are annoyingly loud to some people. So next I compare the Average of the all the Noise made by these engines to the threshold 70s (dB (A)). ##8. Go to Analytics and click Reference Line, choose Line Only. Click on Value and choose create parameter. Created a parameter called “Annoyingly Loud”, and click OK. And then go to Analytics and click Average Line. ##Clearly the Average noise made by all these years from 2000 to 2013 have surpassed the standards of human comfort.